Efficient Unlearning with Privacy Guarantees
Domingo-Ferrer, Josep, Jebreel, Najeeb, Sánchez, David
Privacy protection laws, such as the GDPR, grant individuals the right to request the forgetting of their personal data not only from databases but also from machine learning (ML) models trained on them. Machine unlearning has emerged as a practical means to facilitate model forgetting of data instances seen during training. Although some existing machine unlearning methods guarantee exact forgetting, they are typically costly in computational terms. On the other hand, more affordable methods do not offer forgetting guarantees and are applicable only to specific ML models. In this paper, we present \emph{efficient unlearning with privacy guarantees} (EUPG), a novel machine unlearning framework that offers formal privacy guarantees to individuals whose data are being unlearned. EUPG involves pre-training ML models on data protected using privacy models, and it enables {\em efficient unlearning with the privacy guarantees offered by the privacy models in use}. Through empirical evaluation on four heterogeneous data sets protected with $k$-anonymity and $ε$-differential privacy as privacy models, our approach demonstrates utility and forgetting effectiveness comparable to those of exact unlearning methods, while significantly reducing computational and storage costs. Our code is available at https://github.com/najeebjebreel/EUPG.
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Catalonia > Tarragona Province > Tarragona (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (1.00)
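EUPG pre-trains on data protected with a privacy model such as $k$-anonymity. As a minimal illustration of that privacy model (not of the EUPG framework itself, whose implementation is in the linked repository), the sketch below checks the $k$-anonymity condition: every combination of quasi-identifier values must appear in at least $k$ records. Record fields and values are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check the k-anonymity condition: every combination of
    quasi-identifier values must occur in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Toy generalized records: ages bucketed, ZIP codes truncated.
records = [
    {"age": "30-39", "zip": "43*", "diagnosis": "flu"},
    {"age": "30-39", "zip": "43*", "diagnosis": "cold"},
    {"age": "40-49", "zip": "43*", "diagnosis": "flu"},
    {"age": "40-49", "zip": "43*", "diagnosis": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True
print(is_k_anonymous(records, ["age", "zip"], 3))  # False
```

A model pre-trained only on such generalized records never sees exact quasi-identifier values, which is the property EUPG leverages at unlearning time.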
ClustEm4Ano: Clustering Text Embeddings of Nominal Textual Attributes for Microdata Anonymization
Aufschläger, Robert, Wilhelm, Sebastian, Heigl, Michael, Schramm, Martin
This work introduces ClustEm4Ano, an anonymization pipeline that can be used for generalization and suppression-based anonymization of nominal textual tabular data. It automatically generates value generalization hierarchies (VGHs) that, in turn, can be used to generalize attributes in quasi-identifiers. The pipeline leverages embeddings to generate semantically close value generalizations through iterative clustering. We applied KMeans and Hierarchical Agglomerative Clustering on $13$ different predefined text embeddings (both open-source and closed-source, the latter accessed via APIs). Our approach is experimentally tested on a well-known benchmark dataset for anonymization: the UCI Machine Learning Repository's Adult dataset. ClustEm4Ano supports anonymization procedures by offering more possibilities compared to using arbitrarily chosen VGHs. Experiments demonstrate that these VGHs can outperform manually constructed ones in terms of downstream efficacy (especially for small $k$-anonymity ($2 \leq k \leq 30$)) and therefore can foster the quality of anonymized datasets. Our implementation is made public.
- Europe > Germany (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
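The core step of such a pipeline, clustering value embeddings so that semantically close nominal values fall into one generalization group, can be sketched with a greedy agglomerative merge. This is a toy illustration under assumed 2-D "embeddings", not the ClustEm4Ano implementation, which uses real text-embedding models.

```python
import math

def centroid(vectors):
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(embeddings, n_clusters):
    """Greedy agglomerative clustering: repeatedly merge the two
    clusters with the closest centroids until n_clusters remain."""
    clusters = [[value] for value in embeddings]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([embeddings[v] for v in clusters[i]])
                cj = centroid([embeddings[v] for v in clusters[j]])
                d = dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2-D embeddings of occupation values; real ones would
# come from one of the 13 text-embedding models.
emb = {
    "nurse": [0.9, 0.1], "doctor": [0.85, 0.15],
    "plumber": [0.1, 0.9], "electrician": [0.15, 0.85],
}
groups = agglomerate(emb, 2)
print(sorted(sorted(g) for g in groups))
# [['doctor', 'nurse'], ['electrician', 'plumber']]
```

Each resulting group becomes one generalized value at the next level of the VGH (e.g. {nurse, doctor} → "medical occupation"), and repeating the merge yields the full hierarchy.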
5807a685d1a9ab3b599035bc566ce2b9-Reviews.html
SUMMARY: This paper is a NIPS-formatted version of an arXiv manuscript, and uses a Fano/Le Cam-style argument to derive a lower bound on estimation algorithms that operate on private data when the algorithm is not trusted by the data holder. As a corollary, randomized response turns out to be an optimal strategy in some sense. As a caveat to this review, I did not go through the supplementary material. This confusion may be exacerbated by statements such as those at the bottom of page 4: "Thus, for suitably large sample sizes n, the effect of providing differential privacy at a level \alpha …" The authors should avoid making such overly broad (and perhaps incorrect) statements when describing their results. In particular, experimental results suggest that \alpha \approx 1 may be the most one can expect for certain learning problems (under differential privacy), so it is unclear what the bound tells us about this case.
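The randomized response mechanism the review refers to is the classic local-privacy primitive: each respondent reports their true bit only with some probability and a random bit otherwise, and the analyst debiases the aggregate. A minimal sketch (parameters and the debiasing step are standard, not taken from the reviewed paper):

```python
import random

def randomized_response(true_bit, p_truth=0.75, rng=random):
    """Report the true bit with probability p_truth, otherwise report
    a uniformly random bit, so no single report reveals the truth."""
    if rng.random() < p_truth:
        return true_bit
    return rng.randint(0, 1)

def debias(reports, p_truth=0.75):
    """Unbiased estimate of the true proportion of 1s. Each bit is
    effectively reported truthfully with probability
    q = p_truth + (1 - p_truth) / 2, so
    E[mean(reports)] = (1 - q) + pi * (2q - 1); invert for pi."""
    q = p_truth + (1 - p_truth) / 2
    observed = sum(reports) / len(reports)
    return (observed - (1 - q)) / (2 * q - 1)

rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000   # true proportion of 1s: 0.3
reports = [randomized_response(b, rng=rng) for b in true_bits]
print(debias(reports))                 # close to the true 0.3
```

Individual reports are deniable, yet the population-level estimate stays accurate for large n, which is the trade-off the lower bound in the paper quantifies.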
Ontology for Healthcare Artificial Intelligence Privacy in Brazil
Vaz, Tiago Andres, Dora, José Miguel Silva, Lamb, Luís da Cunha, Camey, Suzi Alves
Using the terminology defined by current legislation, the article outlines a systematic approach to handling hospital data anonymously in preparation for its use in Artificial Intelligence (AI) applications in healthcare. The development process consisted of 7 pragmatic steps, including defining scope, selecting knowledge, reviewing important terms, constructing classes that describe designs used in epidemiological studies, machine learning paradigms, types of data and attributes, risks that anonymized data may be exposed to, privacy attacks, techniques to mitigate re-identification, privacy models, and metrics for measuring the effects of anonymization. The article concludes by demonstrating the practical implementation of this ontology in hospital settings for the development and validation of AI.
- South America > Brazil (0.52)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
The Limits of Differential Privacy (and Its Misuse in Data Release and Machine Learning)
The traditional approach to statistical disclosure control (SDC) for privacy protection is utility-first. Since the 1970s, national statistical institutes have been using anonymization methods with heuristic parameter choice and suitable utility preservation properties to protect data before release. Their goal is to publish analytically useful data that cannot be linked to specific respondents or leak confidential information on them. In the late 1990s, the computer science community took another angle and proposed privacy-first data protection. In this approach a privacy model specifying an ex ante privacy condition is enforced using one or several SDC methods, such as noise addition, generalization, or microaggregation.
- North America > United States (0.48)
- Europe > Spain > Catalonia > Tarragona Province > Tarragona (0.05)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
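In the privacy-first approach described above, an ex ante condition such as $\varepsilon$-differential privacy is enforced by an SDC method like noise addition. A minimal sketch of that enforcement for a counting query (data values and the query are hypothetical; the Laplace sampling uses the standard inverse-CDF formula):

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) via inverse CDF: u ~ Uniform(-1/2, 1/2)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(values, predicate, eps, rng=random):
    """eps-differentially private count: adding or removing one record
    changes a count by at most 1 (sensitivity 1), so Laplace noise with
    scale 1/eps enforces the eps-DP condition."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / eps, rng)

ages = [23, 37, 41, 58, 62, 29, 45, 51]
noisy = dp_count(ages, lambda a: a >= 40, eps=1.0, rng=random.Random(42))
print(noisy)  # true count is 5, plus Laplace(1) noise
```

The privacy condition here is fixed in advance by the choice of $\varepsilon$; utility is whatever survives the mandated noise, which is exactly the reversal of the utility-first SDC tradition.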
Learning from Mixtures of Private and Public Populations
Bassily, Raef, Moran, Shay, Nandi, Anupama
We initiate the study of a new model of supervised learning under privacy constraints. Imagine a medical study where a dataset is sampled from a population of both healthy and unhealthy individuals. Suppose healthy individuals have no privacy concerns (in such case, we call their data "public") while the unhealthy individuals desire stringent privacy protection for their data. In this example, the population (data distribution) is a mixture of private (unhealthy) and public (healthy) sub-populations that could be very different. Inspired by the above example, we consider a model in which the population $\mathcal{D}$ is a mixture of two sub-populations: a private sub-population $\mathcal{D}_{\sf priv}$ of private and sensitive data, and a public sub-population $\mathcal{D}_{\sf pub}$ of data with no privacy concerns. Each example drawn from $\mathcal{D}$ is assumed to contain a privacy-status bit that indicates whether the example is private or public. The goal is to design a learning algorithm that satisfies differential privacy only with respect to the private examples. Prior works in this context assumed a homogeneous population where private and public data arise from the same distribution, and in particular designed solutions which exploit this assumption. We demonstrate how to circumvent this assumption by considering, as a case study, the problem of learning linear classifiers in $\mathbb{R}^d$. We show that in the case where the privacy status is correlated with the target label (as in the above example), linear classifiers in $\mathbb{R}^d$ can be learned, in the agnostic as well as the realizable setting, with sample complexity which is comparable to that of the classical (non-private) PAC-learning. It is known that this task is impossible if all the data is considered private.
- North America > United States > Ohio (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
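The key modeling device above is the per-example privacy-status bit. A toy sketch of using it, estimating a bounded mean where only private values receive Laplace noise while public values enter exactly, is given below; it illustrates the bit-driven split only, not the paper's linear-classifier construction, and all data and parameters are hypothetical.

```python
import math
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) via inverse CDF: u ~ Uniform(-1/2, 1/2)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def mixed_mean(examples, eps, clip=1.0, rng=random):
    """Mean over (value, is_private) pairs where only private values are
    protected: their clipped sum receives Laplace noise (replacing one
    private value changes the sum by at most 2*clip), while public
    values contribute exactly."""
    pub = [x for x, is_private in examples if not is_private]
    priv = [max(-clip, min(clip, x)) for x, is_private in examples if is_private]
    noisy_priv_sum = sum(priv) + laplace(2 * clip / eps, rng)
    return (sum(pub) + noisy_priv_sum) / len(examples)

# 600 public "healthy" readings near 0.2, 400 private "unhealthy" near 0.8,
# mirroring the medical-study example: privacy status correlates with label.
examples = [(0.2, False)] * 600 + [(0.8, True)] * 400
est = mixed_mean(examples, eps=1.0, rng=random.Random(1))
print(est)  # close to the true mean 0.44
```

Because only the private subpopulation pays the noise cost, accuracy degrades far less than under an all-private treatment, which is the intuition behind the sample-complexity gains the paper proves.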
Practical Federated Gradient Boosting Decision Trees
Li, Qinbin, Wen, Zeyi, He, Bingsheng
Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, with many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to costly cryptographic transformations such as secret sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. In this paper, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original record to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with normal training with the local data of each owner, our approach can significantly improve the predictive accuracy, and achieve comparable accuracy to the original GBDT with the data from all parties.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Singapore (0.04)
- Oceania > Australia > Western Australia (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.93)
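The similarity primitive the paper relies on, locality-sensitive hashing, can be illustrated with the standard random-hyperplane scheme (SimHash) for cosine similarity: similar vectors agree on most signature bits. This is a generic sketch with hypothetical vectors and parameters, not the paper's federated protocol.

```python
import random

def make_hyperplanes(dim, n_bits, rng):
    """Draw n_bits random Gaussian hyperplanes; a vector's LSH signature
    (SimHash) is the sign pattern of its dot products with them."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(7)
planes = make_hyperplanes(dim=4, n_bits=32, rng=rng)
a = [1.0, 0.9, 0.0, 0.1]    # a and b point in nearly the same direction
b = [0.9, 1.0, 0.1, 0.0]
c = [-1.0, 0.2, 0.9, -0.8]  # c points elsewhere
sig_a, sig_b, sig_c = (signature(v, planes) for v in (a, b, c))
print(hamming(sig_a, sig_b), hamming(sig_a, sig_c))
```

Parties can exchange such compact signatures to locate similar records across silos without ever revealing the raw feature vectors, which is the relaxed-privacy trade-off the paper formalizes.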